Abstract - Category Theory is used to describe a category of
fusors. The category is formed from a model of a process beginning
with an event and leading to the final labeling of the event.
Although many techniques of fusing information have been
developed, the inherent relationships among different types of fusion
techniques (fusors) have not yet been fully explored. In this paper,
a foundation of fusion is presented, definitions are developed, and a
method of measuring the performance of fusors is given. Functionals
on receiver operating characteristic (ROC) curves are developed
to form a partial ordering of a set of classifier families.
The functional also induces a category of fusion rules. The treatment
includes a proof of how to find the Bayes optimal classifier
(or Bayes optimal fusor, if available) from a ROC curve.

Keywords:  information  fusion,  fusors,  ROC,  ROC  curves,  Bayes 
Optimal. 

1  Introduction 

Information fusion is a rapidly advancing science. Researchers
are daily adding to the known repertoire of fusion
techniques (fusion rules). An agency that is building a fusion
system to detect or identify objects is bound to want to
get the best possible result for the money expended. It is
with this goal in mind that we need a way to compete various
fusion rules for acquisition purposes. It appears that
the receiver operating characteristic curves (ROC curves)
that can be developed for such systems under test conditions
may serve well in this regard. We will demonstrate the
development of a functional on ROC curves which will allow
us, under certain assumptions and constraints, to compete
classifiers, fusors (fusion rules with a constraint), and fusion
systems in order to choose the best from among finitely
many competitors.

2  Category  Theory  Preliminaries 

Category theory is a branch of mathematics useful for determining
universal properties of classes. The science of
information fusion does not yet know all of the relationships
between the classes of data and the mappings
from one type of data to another. It has been our goal to
engage the community to think in terms of generalities
when studying fusion processes in order to abstract the
processes and perhaps gain some clarity of thought, if not
genuine insight. We have drawn upon the work of various
authors [1, 2, 3, 4] to present the definitions.


Definition 1 (Category) A category $\mathcal{C}$ consists of the following:

A1. A collection of objects denoted $\mathrm{Ob}(\mathcal{C})$.

A2. A collection of arrows (maps) denoted $\mathrm{Ar}(\mathcal{C})$.

A3. Two mappings, called Domain (dom) and Codomain
(cod), which assign to an arrow $f \in \mathrm{Ar}(\mathcal{C})$ a domain
and codomain from the objects of $\mathrm{Ob}(\mathcal{C})$. Thus, for
arrow $f$, given by $O_1 \xrightarrow{f} O_2$, $\mathrm{dom}(f) = O_1$ and
$\mathrm{cod}(f) = O_2$.

A4. A mapping assigning each object $O \in \mathrm{Ob}(\mathcal{C})$ a
unique arrow $1_O$ called the identity arrow, such that
$\mathrm{dom}(1_O) = \mathrm{cod}(1_O) = O$,
and such that for any existing element, $x$, of $O$, we
have that $1_O(x) = x$.

A5. A map, $\circ$, called composition, $A \times A \to A$, where $A = \mathrm{Ar}(\mathcal{C})$.
Thus, given $f, g \in A$ with $\mathrm{cod}(f) = \mathrm{dom}(g)$ there
exists a unique $h \in A$ such that $h = g \circ f$.

Axioms A3-A5 lead to the associative and identity rules:

• Associative Rule. Given appropriately defined arrows
$f$, $g$, and $h$ we have that

$$(f \circ g) \circ h = f \circ (g \circ h).$$

• Identity Rule. Given arrows $A \xrightarrow{f} B$ and
$B \xrightarrow{g} A$, then there exists an identity arrow $1_A$ such
that $1_A \circ g = g$ and $f \circ 1_A = f$.

Definition 2 (Subcategory) A subcategory $\mathcal{B}$ of $\mathcal{A}$ is a category
whose objects are some of the objects of $\mathcal{A}$ and whose
arrows are some of the arrows of $\mathcal{A}$, such that for each arrow
$f$ in $\mathcal{B}$, $\mathrm{dom}(f)$ and $\mathrm{cod}(f)$ are in $\mathrm{Ob}(\mathcal{B})$, along with
each composition of arrows, and an identity arrow for each
element of $\mathrm{Ob}(\mathcal{B})$.
A category of interest is the category Set, which has
sets as objects and all total functions as arrows, with composition
of functions as the composition. Clearly this construct
has identity arrows and the associative rule applies, so it
is indeed a category. The subcategories of interest to us
are subcategories of particular types of data sets, denoted
$\mathcal{D}$, with objects similar types of data sets and arrows only
the identity arrows, and subcategories of particular types of
feature sets, denoted $\mathcal{F}$, with objects similar types of feature
sets, and arrows only the identity arrows. The objects
and arrows of these categories shall correspond to a particular
sensor system, and so represent all of the possible
data (or feature) sets that can be generated by the sensor-processor
system. For example, the data generated by a
particular sensor system may be $2 \times 2$ real-valued matrices.
In this case, $\mathcal{D} = (\mathrm{Ob}(\mathcal{D}), \mathrm{id}_{\mathcal{D}}, \mathrm{id}_{\mathcal{D}}, \circ)$ represents the
category with only the identities as arrows, and $\circ$ being the
usual composition of functions.
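The associative and identity rules can be checked concretely for a few arrows of Set. The following is a minimal sketch, not part of the original development: the functions `f`, `g`, `h`, the helper `discrete_category`, and the finite sample used for the pointwise check are all illustrative assumptions.

```python
def compose(g, f):
    """Return the arrow g o f (apply f first, then g)."""
    return lambda x: g(f(x))


def identity(x):
    """Identity arrow 1_O, usable on any object O of Set."""
    return x


# Hypothetical total functions standing in for arrows of Set.
def f(x):
    return x + 1


def g(x):
    return 2 * x


def h(x):
    return x - 3


sample = range(-5, 6)  # finite sample used for a pointwise check

# Associative rule: (h o g) o f agrees with h o (g o f) on the sample.
assoc_ok = all(
    compose(compose(h, g), f)(x) == compose(h, compose(g, f))(x) for x in sample
)

# Identity rule: composing with the identity on either side leaves f unchanged.
ident_ok = all(
    compose(f, identity)(x) == f(x) == compose(identity, f)(x) for x in sample
)


def discrete_category(objects):
    """A category whose only arrows are identities: one identity per object."""
    return {obj: identity for obj in objects}


# A data-set category in the sense above: objects with identity arrows only.
D = discrete_category(["2x2 real matrices"])
```

A pointwise check on a sample is only evidence, not a proof; it is meant to make the two rules tangible rather than to verify them in general.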

A  further  categorical  term  that  will  be  useful  is  that  of  a 
functor. 

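Definition 3 (Functor) A functor $\mathfrak{F}$ between two categories
$\mathcal{A}$ and $\mathcal{B}$ is a pair of maps $(\mathfrak{F}_{\mathrm{Ob}}, \mathfrak{F}_{\mathrm{Ar}})$,

$$\mathfrak{F}_{\mathrm{Ob}} : \mathrm{Ob}(\mathcal{A}) \to \mathrm{Ob}(\mathcal{B}),$$

$$\mathfrak{F}_{\mathrm{Ar}} : \mathrm{Ar}(\mathcal{A}) \to \mathrm{Ar}(\mathcal{B}),$$

such that $\mathfrak{F}$ maps $\mathrm{Ob}(\mathcal{A})$ to $\mathrm{Ob}(\mathcal{B})$ and $\mathrm{Ar}(\mathcal{A})$ to $\mathrm{Ar}(\mathcal{B})$
while preserving the associative property of the composition
map and preserving identity maps.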


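Thus, given categories $\mathcal{A}$, $\mathcal{B}$ and functor $\mathfrak{F} : \mathcal{A} \to \mathcal{B}$, if
$A \in \mathrm{Ob}(\mathcal{A})$ and $f, g, h, 1_A \in \mathrm{Ar}(\mathcal{A})$ such that $f \circ g = h$
is defined, then there exist $B \in \mathrm{Ob}(\mathcal{B})$ and $f', g', h', 1_B \in
\mathrm{Ar}(\mathcal{B})$ such that

i) $\mathfrak{F}(A) = B$.
ii) $\mathfrak{F}(f) = f'$, $\mathfrak{F}(g) = g'$.
iii) $h' = \mathfrak{F}(h) = \mathfrak{F}(f \circ g) = \mathfrak{F}(f) \circ \mathfrak{F}(g) = f' \circ g'$.
iv) $\mathfrak{F}(1_A) = 1_{\mathfrak{F}(A)} = 1_B$.

Definition 4 (Natural Transformation) Given categories
$\mathcal{A}$ and $\mathcal{B}$ and functors $\mathfrak{F}$ and $\mathfrak{G}$ with $\mathcal{A} \xrightarrow{\mathfrak{F}} \mathcal{B}$ and
$\mathcal{A} \xrightarrow{\mathfrak{G}} \mathcal{B}$, a Natural Transformation is a family
of arrows $\nu = \{\nu_A \mid A \in \mathcal{A}\}$ such that for each $f \in \mathrm{Ar}(\mathcal{A})$,
$A \xrightarrow{f} A'$, $A' \in \mathcal{A}$, the square

$$\begin{array}{ccc}
\mathfrak{F}(A) & \xrightarrow{\ \nu_A\ } & \mathfrak{G}(A) \\
{\scriptstyle \mathfrak{F}(f)}\downarrow & & \downarrow{\scriptstyle \mathfrak{G}(f)} \\
\mathfrak{F}(A') & \xrightarrow{\ \nu_{A'}\ } & \mathfrak{G}(A')
\end{array}$$

commutes. We then say the arrows $\nu_A$ are the components
of $\nu : \mathfrak{F} \to \mathfrak{G}$, and call $\nu$ the natural transformation
of $\mathfrak{F}$ to $\mathfrak{G}$.

Definition 5 (Functor Category $\mathcal{A}^{\mathcal{B}}$) Given categories $\mathcal{A}$
and $\mathcal{B}$, the notation $\mathcal{A}^{\mathcal{B}}$ refers to the category of all functors
$\mathfrak{F} : \mathcal{B} \to \mathcal{A}$. This category has all such functors as
objects and the natural transformations between them as
arrows.

Definition 6 (Product Category) Let $\{\mathcal{D}_i\}_{i=1}^{n}$ represent a
finite collection of data set categories. Then $\prod_{i=1}^{n} \mathcal{D}_i$ is
the corresponding product category.

3 Modelling Fusion within the
Event-Decision Model

Let $X$ be a set of states for some event, and $T \subset \mathbb{R}$ be
a bounded interval of time. Interval $T$ sorts $X$ such that
we call $E \subset X \times T$ an event-state. An event-state is then
comprised of event-state elements, $e = (x, t)$, where $x \in X$
and $t \in T$. Thus $e$ denotes a state $x$ at an instant of time $t$.
Let $\mathcal{E} = X \times T$ be the set of all event-states for an event
over time interval $T$.

The following discussion can be expanded to a finite
number of sensors, but for now consider the simple model
of a multi-sensor process using two sensors in Figure 1.

$$E_1 \xrightarrow{s_1} D_1 \xrightarrow{p_1} F_1 \xrightarrow{c_1} L_1$$

$$E_2 \xrightarrow{s_2} D_2 \xrightarrow{p_2} F_2 \xrightarrow{c_2} L_2$$

Fig. 1: Simple Model of a Dual-Sensor Process.

The sets $E_i$, for $i \in \{1, 2\}$, are sets of event-states. It
is useful to think of $E_i$ as the set of all possible states of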
an event (such as an aircraft flying) occurring within sensor
$s_i$'s field of view. Given $E_i$ thus defined, now define
a sensor as a mapping from an event-state set to a data set,
$D_i$. The mapping $s_i$ is then a sensor. A data set could
be a radar signature return of an object, multiple radar signature
returns, a two-dimensional image, or even a video
stream over the time period of the event-state set, for example.
In any case we would like to extract recognizable
features from the data set. Hence, mapping $p_i$ represents
a processor which does just that. Processors are mappings
from data sets into feature sets, $F_i$. Finally, from the feature
sets we want to determine a label or decision based
upon the sensed event-state. This is achieved through use
of the classifiers $c_i$, which map the feature set into a label
set. The label set $L_i$ can be as simple as the two-class set
{target, non-target} or could have a more complex nature,
such as the types of targets and non-targets, in order to define
the battlefield more clearly for the warfighter. The
diagram in Figure 1 represents a simple sensor process pair
involving two sensors, two processors, and two classifiers,
but can easily be extended to any finite number.

Now consider two sensors not necessarily co-located. Hence they
may sense different event-state sets. Figure 1 models two
sensors with differing fields of view. Performing fusion
along any node or edge in this graph will result in an elevated
level of fusion [5], that of situation refinement or
threat refinement, since we are not fusing common information
about a particular object or objects. There are
possible scenarios other than the one Figure 1 depicts. The sensors
can overlap in their field of view, either partially or fully, in
which case fusing the information regarding object event-states
within the intersection may be useful. Thus, a fusion
process may be used to increase the reliability and accuracy
of the system above that which is possessed by either of the
sensors on its own. Let $E$ represent the event-state set that
is common to both sensors, that is, $E = E_1 \cap E_2$.

Hence, there are two basic challenges regarding fusion. The first
is how to fuse information from multiple sources regarding
common event-states (or targets, if preferred) for the purpose
of knowing the event-state (presumably for the purposes
of tracking, identifying, and estimating future event-states).
The second and much more challenging problem is
to fuse information from multiple sources regarding event-states
not common to all sensors, for the purpose of knowing
the state of a situation (the situation-state), such as an
enemy situation or threat assessment. We label the two
types of fusion scenarios discussed event-state fusion and
situation-state fusion. Therefore, Figure 2 represents the
Event-State-Decision model of a dual sensor process.

Fig. 2: Dual Sensor Process for Overlapping Field of View.

The only restriction necessary for the usefulness of this model is
that a common field of view (the event-state) be used. For
example, $D_1$ and $D_2$ can actually be the same data set under
the model, while $s_1$ and $s_2$ could be different sensors.
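One branch of the model, from event-state to label, is a composition of a sensor, a processor, and a classifier. The sketch below illustrates that composition only; the particular maps `sensor`, `processor`, and `classifier`, the scalar data and feature types, and the threshold `theta` are hypothetical choices made for the example, not anything specified by the model.

```python
from typing import Tuple

EventState = Tuple[float, float]  # e = (x, t): state x at time instant t


def sensor(e: EventState) -> float:
    """Hypothetical sensor s: maps an event-state element to a datum."""
    x, t = e
    return 2.0 * x + 0.1 * t


def processor(d: float) -> float:
    """Hypothetical processor p: extracts a scalar feature from a datum."""
    return d / 2.0


def classifier(feature: float, theta: float = 1.0) -> str:
    """Hypothetical classifier c: thresholds the feature into a two-class label."""
    return "target" if feature >= theta else "non-target"


def event_decision(e: EventState) -> str:
    """One branch of the model: the composition c o p o s."""
    return classifier(processor(sensor(e)))


label_a = event_decision((1.5, 0.0))  # feature 1.5, at or above the threshold
label_b = event_decision((0.2, 1.0))  # feature 0.25, below the threshold
```

A second branch would use its own sensor, processor, and classifier, possibly over a different event-state set.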

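Definition 7 (Fusion Rule) Let $\prod_{i=1}^{n} \mathcal{D}_i$ be a product category
of data (or feature) set categories. Then a fusion rule
is a functor $\mathfrak{R} : \prod_{i=1}^{n} \mathcal{D}_i \to \mathcal{D}_0$, and $\mathcal{D}_0$ is the resulting data set
category.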

The key to this definition is to realize that a fusion rule
(see Figure 3) simply combines the inputs from a product
category into a resultant data set (or feature set), which is an
element of a single data (or feature) set category. There is
no requirement that the output be “better” than
the output of a system designed without a fusion rule.
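At the level of individual data sets, a fusion rule is just a map from a pair (or tuple) of inputs to a single output. The two rules below are illustrative assumptions, not rules proposed in this paper: `fuse_average` keeps the output in the same kind of data set as the inputs, while `fuse_concatenate` produces an element of the product of the two input sets. Neither is claimed to perform better than an unfused system.

```python
from typing import List, Sequence, Tuple


def fuse_average(d1: Sequence[float], d2: Sequence[float]) -> List[float]:
    """Hypothetical rule for same-type data sets: element-wise mean."""
    if len(d1) != len(d2):
        raise ValueError("same-type data sets must have equal length")
    return [(a + b) / 2.0 for a, b in zip(d1, d2)]


def fuse_concatenate(
    d1: Sequence[float], d2: Sequence[float]
) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Hypothetical rule for differing data sets: the output lives in D1 x D2."""
    return (tuple(d1), tuple(d2))


averaged = fuse_average([1.0, 3.0], [3.0, 5.0])
paired = fuse_concatenate([1.0, 3.0], [0.5])
```

Because the data set categories carry only identity arrows, specifying the map on objects is all that is needed for the sketch.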

We now desire to show how defining a fusor (see Definition 9)
as a fusion rule with a constraint changes the sensor
process model into an Event-State Fusion model. Continuing
to consider the dual sensor process in Figure 2, a
fusion rule can be applied to either the data sets or the
feature sets. Given a fusion rule $\mathfrak{R}$ for the two data
sets as in Figure 3, our model becomes that of Figure 5.
A new data set, processor, feature set, and classifier may
become necessary as a result of the fusion rule having a
different codomain than the previous systems. The label
set may change also, but for the remainder of this paper
we are interested only in a two-class label set, that of
$L = L_1 = L_2 = \{\text{Target}, \text{Nontarget}\}$.

Fig. 3: Fusion Rule on Category of Data Sets.

$$(D_1, D_2) \overset{\mathfrak{R}}{\longmapsto} D_3$$

Fig. 4: Fusion Rule Applied to Data Sets.

Fig. 5: Fusion Rule Applied within a dual sensor process.

In a homogeneous (or within) fusion scenario, the data sets (or feature sets)
are the same, $D_1 = D_2 = D_3$. This is true in the case that
the sensors used are the same type (that is, they collect the
same measurements, but from possibly different locations
relative to the overlapping field of view). In the case where
the data sets (or feature sets) are truly different, a composite
data set (and/or feature set) which is different from the first
two (possibly even the product of the first two) is created as
the codomain of the fusion rule functor.

At this point we may ask in what way the
process modeled in Figure 5 is superior to the original processes
shown in Figure 2. One way of comparing performance
in such systems is to compare the processes’ receiver
operating characteristics (ROC curves).

3.1  Developing  a  ROC  Curve 

Setting aside the fusion process for a moment, we focus on
the classification process

$$F \xrightarrow{\ c\ } L.$$

Assume that $F$ is a probability space, so that $F$ can be denoted
equivalently as $(F, \Sigma, \Pr)$, where $\Pr$ is a probability measure
and $\Sigma$ is the associated $\sigma$-field. Recall that $L$ is a two-class
label set, T = target, N = non-target, and $L = \{T, N\}$.


Finally, consider the hypothetical “perfect” classifier $c^*$, the
classifier which always matches a feature element with the
correct label. Subjecting our processes to tests, we can run
a collection of features through the classifier and produce a
corresponding label. Given $x \in F$ and using the inverse
image of the classifier we can calculate the hit rate,

$$P_{tp} = \frac{\Pr\{x \mid x \in c^{-1}(T) \wedge x \in (c^*)^{-1}(T)\}}{\Pr\{x \mid x \in (c^*)^{-1}(T)\}} \qquad (1)$$

and the false alarm rate,

$$P_{fp} = \frac{\Pr\{x \mid x \in c^{-1}(T) \wedge x \in (c^*)^{-1}(N)\}}{\Pr\{x \mid x \in (c^*)^{-1}(N)\}}. \qquad (2)$$
The ordered pair $(P_{fp}, P_{tp}) \in [0, 1] \times [0, 1]$ is the ROC
for the system. Now it is desirable for a classifying system
to have a parameter associated with the classifier, such
that changing the parameter (which is possibly multidimensional)
changes the ROC. In such a case, a parameter
set $\Theta$ would be chosen such that the associated classifier
family $\{c_\theta\}_{\theta \in \Theta}$ continuously maps the feature set
into the label set in a bijection, and such that the curve
$f = (P_{fp}(c_\theta), P_{tp}(c_\theta))$ is the projection of the trajectory
$(\theta, P_{fp}(c_\theta), P_{tp}(c_\theta))$ into the $P_{fp}$-$P_{tp}$ plane. In
this case we have that

$$P_{tp}(\theta) = \frac{\Pr\{x \mid x \in c_\theta^{-1}(T) \wedge x \in (c^*)^{-1}(T)\}}{\Pr\{x \mid x \in (c^*)^{-1}(T)\}} \qquad (3)$$

and

$$P_{fp}(\theta) = \frac{\Pr\{x \mid x \in c_\theta^{-1}(T) \wedge x \in (c^*)^{-1}(N)\}}{\Pr\{x \mid x \in (c^*)^{-1}(N)\}}. \qquad (4)$$

Call such a parameter set an admissible parameter set. Note
the parameter need not necessarily be associated with the
classifier of the system, but could be associated instead with
the sensor(s), processor(s), or any combination of the three.
What is key is that the final parameter set must produce a
corresponding ROC curve as a continuous curve from (0, 0)
to (1, 1) in the $P_{fp}$-$P_{tp}$ plane, as the example in Figure 6
shows. The parameter $\theta$ is the threshold of the ROC.
Is there a threshold among a particular family of classifiers
that performs best? It is well known and accepted that the
threshold for which the probability of a misclassification
(or Bayes error) is minimized is considered best and denoted
the Bayes optimal threshold (BOT). That is, if $\theta^*$ is
the solution to the problem

$$\begin{aligned}
&\min_{\theta}\Big[\Pr\{x \in F : (x \in c_\theta^{-1}(T) \wedge x \in (c^*)^{-1}(N)) \vee (x \in c_\theta^{-1}(N) \wedge x \in (c^*)^{-1}(T))\}\Big] \\
&\quad = \min_{\theta}\Big[\Pr\{x \in F : x \in c_\theta^{-1}(T) \wedge x \in (c^*)^{-1}(N)\} + \Pr\{x \in F : x \in c_\theta^{-1}(N) \wedge x \in (c^*)^{-1}(T)\}\Big] \\
&\quad = \min_{\theta}\Big[P_{fp}(\theta)\Pr(N) + (1 - P_{tp}(\theta))\Pr(T)\Big] \qquad (5)
\end{aligned}$$

where $\Pr(T)$ and $\Pr(N)$ are the prior probabilities of a target
class and non-target class, respectively, then $\theta^*$ is the
BOT for the family of classifiers $\{c_\theta\}_{\theta \in \Theta}$.
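The rates in Equations 3 and 4 and the minimization in Equation 5 can be estimated from labeled test data. The following is a minimal sketch under stated assumptions: each feature is reduced to a scalar score, the classifier $c_\theta$ labels a score as target when it is at least $\theta$, truth labels stand in for the perfect classifier $c^*$, and the search runs over a finite list of candidate thresholds. The sample data and the names `roc_point`, `bayes_error`, and `bayes_optimal_threshold` are hypothetical.

```python
def roc_point(scores, truth, theta):
    """Empirical (P_fp, P_tp) for the classifier that labels score >= theta as T."""
    n_t = sum(1 for y in truth if y == "T")
    n_n = len(truth) - n_t
    tp = sum(1 for s, y in zip(scores, truth) if y == "T" and s >= theta)
    fp = sum(1 for s, y in zip(scores, truth) if y == "N" and s >= theta)
    return fp / n_n, tp / n_t


def bayes_error(p_fp, p_tp, prior_n):
    """Bracketed quantity of Equation 5: P_fp Pr(N) + (1 - P_tp) Pr(T)."""
    return p_fp * prior_n + (1.0 - p_tp) * (1.0 - prior_n)


def bayes_optimal_threshold(scores, truth, thetas, prior_n):
    """Candidate threshold with the smallest estimated Bayes error."""
    return min(
        thetas,
        key=lambda th: bayes_error(*roc_point(scores, truth, th), prior_n),
    )


# Hypothetical test set: four non-target and four target feature scores.
scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
truth = ["N", "N", "N", "N", "T", "T", "T", "T"]
thetas = [0.0, 0.2, 0.4, 0.5, 0.65, 0.75, 1.0]

theta_star = bayes_optimal_threshold(scores, truth, thetas, prior_n=0.6)
bot_point = roc_point(scores, truth, theta_star)
```

With $\Pr(N) = 0.6$ the search selects $\theta = 0.65$ on this sample; a different prior can move the selected threshold, which is the dependence on priors discussed next.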


[Plot: ROC curve of two normal distributions]

Fig. 6: A Typical ROC Curve


An obvious question at this point is, given two families of
classifiers, $\{a_\theta\}_{\theta \in \Theta}$ and $\{b_\pi\}_{\pi \in \Pi}$, which classifier is best?
This is not an easy problem, as seen in [6]. It is tempting
to use some measure of the BOT, but notice that the BOT
is dependent upon the selection of prior probabilities. The
priors are generally not known, so selection of a better classifier
based on ROC curves may not be possible, since ROC
curves for different families can overlap. Rather, we should
ask the question: given an operating choice of prior probabilities,
that is, a fixed value of $\Pr(T)$, can we choose among
competing classifier families one that is superior to the others?
One way to answer the question is derived in a very unexpected
way.

3.2 A Variational Calculus Solution to
Determining the Bayes Optimal Threshold of
a Classifier Family

We will only consider ROC curves that are smooth (differentiable)
over the entire range, i.e., given a ROC curve
$f$, $f \in C^1([0, 1])$. Given a diagram describing the family
of classifiers $\{c_\theta\}_{\theta \in \Theta}$, with $\Theta$ an admissible parameter set
and $(F, \Sigma, \Pr)$ a probability space of feature vectors,
there is then a graph
$G = \{(\theta, P_{fp}(\theta), P_{tp}(\theta)) : \theta \in \Theta\}$ which we call the ROC
trajectory. The projection of the ROC trajectory onto the
$P_{fp}$-$P_{tp}$ plane, $f = \{(P_{fp}(\theta), P_{tp}(\theta)) : \theta \in \Theta\}$, is the
ROC curve of the classifier family. Hence for $h \in [0, 1]$
such that $h = P_{fp}(\theta)$ for some $\theta \in \Theta$, we have that
$[P_{fp}]^{-1}(h) = \theta$. It is now clear that the BOT of the
classifier family $\{c_\theta\}_{\theta \in \Theta}$, $\theta^*$, corresponds to some point
$h^* = P_{fp}(\theta^*) \in [0, 1]$. So what can we learn about $h^*$?
Consider the problem stated as follows:

Among all smooth curves whose endpoints lie on
the point (0, 1) and the ROC curve $y = f(h)$, find
the curve for which the functional

$$J[y] = \int_0^h \big[\alpha + \beta\,|y'(t)|\big]\,dt \qquad (6)$$

has a minimum subject to the constraints:

$$y(0) = 1, \qquad y(h) = P_{tp}(\theta), \qquad (7)$$

where $h = P_{fp}(\theta)$ for some $\theta \in \Theta$ and $\beta = 1 - \alpha$
with $\alpha = \Pr(N)$, the prior probability of
no target.

This functional finds the curve with the smallest
weighted Manhattan distance from the point (0, 1) to the
ROC curve. The constraints show that the curve must begin
at (0, 1) and terminate on the ROC curve. Any solution
to Equation 6 must solve Euler’s equation [7]

$$\mathcal{F}_y - \frac{d}{dt}\mathcal{F}_{y'} = 0, \qquad (8)$$

where $\mathcal{F} = \alpha + \beta\,|y'(t)|$, so that $\mathcal{F}_y = 0$ and $\mathcal{F}_{y'} =
\beta\,\mathrm{sign}(y'(t))$. Hence we have that

$$-\frac{d}{dt}\big[\beta\,\mathrm{sign}(y'(t))\big] = 0 \qquad (9)$$

so that $\mathrm{sign}(y'(t))$ is constant for all $t \in [0, 1]$. Thus
$\mathrm{sign}(y'(t))$ can be 0 or $-1$ since the curve has the constraints
of the endpoints (0, 1) and a point on the ROC curve
$f$. Now if $\mathrm{sign}(y'(t)) = 0$ for all $t$, then $y(0) = y(h) =
y(1)$ due to the smoothness of the ROC curve. Thus Equation 6
becomes

$$J[y] = \alpha h = \Pr(N)\,P_{fp}(\theta), \qquad (10)$$

with $P_{fp}(\theta) = 1$. Thus $\Pr(N) = 1$ and the weighted Manhattan
length of curve $y$ is therefore 1. On the other hand, if
$\mathrm{sign}(y'(t)) = -1$, then solving Equation 6 directly yields

$$J[y] = \alpha t\,\Big|_{t=0}^{h} + \Big[\beta\,\mathrm{sign}(y'(t))\,y(t)\Big]_{t=0}^{h} \qquad (11)$$

which reduces to

$$P_{fp}(\theta)\Pr(N) + (1 - P_{tp}(\theta))\Pr(T). \qquad (12)$$

Notice that Equation 12 is identical to the unminimized
Equation 5. Therefore, the $h = h^*$ which minimizes Equation 12
corresponds to the BOT, $\theta^*$, of the family of classifiers!
The transversality condition of the variation is

$$\big[\alpha + \beta\,|y'(t)|\big]_{t=h^*} + \big[\beta\,(f'(t) - y'(t))\,\mathrm{sign}(y'(t))\big]_{t=h^*} = 0 \qquad (13)$$

so that, with $\mathrm{sign}(y'(t)) = -1$,

$$\alpha - \beta f'(h^*) = 0,$$

which is

$$f'(h^*) = \frac{\alpha}{\beta} = \frac{\Pr(N)}{\Pr(T)}. \qquad (14)$$

So the transversality condition tells us that the BOT of a
family of classifiers corresponds to a point on the ROC
curve which has as a derivative the prior ratio $\Pr(N)/\Pr(T)$!
Therefore, if one presumes a prior ratio of 1, then the point
on the curve corresponding to the BOT will have a tangent
to the ROC curve with slope 1. For many problems this will
make the BOT very easy to find given the graphing capabilities
of today’s computers, especially when the parameter
set, $\Theta$, is multidimensional. This gives us an idea of what
would make a good functional for determining which classifier
families are more desirable than others. An immediate
approach would be to choose a preferred prior ratio
and locate the BOTs for each competing classifier family.
Since all the BOTs will have the same slope for lines tangent
to their ROC curves at that point, the BOT with the
tangent line closest to the point (0, 1) would be considered
the best choice. However, it is still possible that many ROC
curves could be constructed so that the BOT for each one
has the same tangent line. This would set up a rather large
equivalence class of classifier families. This is the same
problem faced when using area under the curve (AUC) of
a ROC curve as a functional. In both cases the underlying
posterior conditional probabilities are unknown and there
are just too many possible combinations of posterior distributions
that can produce ROC curves with the same AUC
(or BOT tangent lines).

3.3 A Functional for Comparing Classifier
Families

So, which criterion is best for selecting from among competing
classifiers? We submit that first of all, among all ROC
curves representing the competing classifier families, identifying
the BOT for each ROC is most important, since it
is this threshold which minimizes the corresponding Bayes
error. We can easily identify this point on a ROC curve
presupposing only the prior probabilities $\Pr(N)$ and $\Pr(T)$,
as demonstrated earlier. Furthermore, our decision objective
is, in addition to minimizing Bayes error, to minimize
$P_{fp}$ while simultaneously maximizing $P_{tp}$. The supremum
BOT among all ROC curves would be the point (0, 1), so
we can codify the decision objective mathematically.

Definition 8 (ROC Functional) Let $\{c_\theta\}_{\theta \in \Theta}$ be a classifier
family with an admissible parameter set $\Theta$. Let $f$ be
the corresponding ROC curve. Given data $(\alpha_0, \beta_0, \gamma)$,
where $\alpha_0, \beta_0$ are acceptable levels for $P_{fp}(\theta), P_{tp}(\theta)$ respectively
and $\Pr(N) = \gamma$, determine the point on the
ROC curve, $(P_{fp}(\theta^*), P_{tp}(\theta^*))$, as the right endpoint of
the smooth curve $y$ which minimizes the functional

$$J[y] = \int_0^h \big(\alpha + \beta\,|y'|\big)\,dt$$

(with $\alpha = \gamma$ and $\beta = 1 - \gamma$, as in Equation 6)
subject to the constraints $y(0) = 1$ and $y(h) = f(h)$, where
$h = P_{fp}(\theta)$ for some $\theta \in \Theta$. Call the minimized right
endpoint $(h^*, f(h^*)) = (P_{fp}(\theta^*), P_{tp}(\theta^*))$. Let

$$F(f) = \begin{cases} 0 & \text{if } P_{fp} > \alpha_0 \text{ or } P_{tp} < \beta_0, \\ 1 - k & \text{otherwise,} \end{cases}$$

where

$$k = \sqrt{P_{fp}(\theta^*)^2 + (1 - P_{tp}(\theta^*))^2}.$$

Call the functional $F(\,\cdot\,; \alpha_0, \beta_0, \gamma)$ the ROC functional.


The ROC functional satisfies the requirements we set forth
in our decision objectives. Taking the Euclidean distance
from the point (0, 1) to the point on the ROC corresponding
to its system’s BOT also allows us to express a
preference among ROC curves when more than
one curve contains a BOT with the smallest weighted Manhattan
distance from the point (0, 1).
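The slope condition of Equation 14 and the ROC functional can be checked numerically on a model ROC curve. The sketch below assumes a binormal model that is not part of the paper: non-target scores are standard normal, target scores are normal with mean `mu` and unit variance, and the classifier labels a score as target when it is at least `theta`. The BOT is located by a grid search on Equation 5, the acceptance levels are taken to apply at the BOT, and `k` is taken to be the Euclidean distance from (0, 1) to the BOT point. All function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def roc_point(theta, mu):
    """(P_fp, P_tp) under the assumed binormal model with unit variances."""
    return 1.0 - norm_cdf(theta), 1.0 - norm_cdf(theta - mu)


def bayes_error(theta, mu, gamma):
    """Equation 5 integrand with Pr(N) = gamma and Pr(T) = 1 - gamma."""
    p_fp, p_tp = roc_point(theta, mu)
    return p_fp * gamma + (1.0 - p_tp) * (1.0 - gamma)


def bot(mu, gamma, lo=-6.0, hi=6.0, steps=12001):
    """Grid-search estimate of the Bayes optimal threshold."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda th: bayes_error(th, mu, gamma))


def roc_slope(theta, mu, eps=1e-4):
    """Finite-difference slope dP_tp/dP_fp of the ROC curve at theta."""
    fp_a, tp_a = roc_point(theta - eps, mu)
    fp_b, tp_b = roc_point(theta + eps, mu)
    return (tp_b - tp_a) / (fp_b - fp_a)


def roc_functional(mu, alpha0, beta0, gamma):
    """F(f; alpha0, beta0, gamma) evaluated at the estimated BOT."""
    p_fp, p_tp = roc_point(bot(mu, gamma), mu)
    if p_fp > alpha0 or p_tp < beta0:
        return 0.0
    return 1.0 - math.hypot(p_fp, 1.0 - p_tp)


theta_equal = bot(mu=2.0, gamma=0.5)    # prior ratio Pr(N)/Pr(T) = 1
theta_skewed = bot(mu=2.0, gamma=0.6)   # prior ratio Pr(N)/Pr(T) = 1.5
slope_equal = roc_slope(theta_equal, mu=2.0)
slope_skewed = roc_slope(theta_skewed, mu=2.0)
value = roc_functional(mu=2.0, alpha0=0.2, beta0=0.8, gamma=0.5)
```

Under these assumptions the numerical slope at the grid-search BOT should agree with the prior ratio to within the grid resolution, which is what the transversality condition predicts.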

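Now given a finite collection of competing classifier families

$$B = \{\, b_1 = \{b_\theta\}_{\theta \in \Theta_1},\; b_2 = \{b_\theta\}_{\theta \in \Theta_2},\; \ldots,\; b_n = \{b_\theta\}_{\theta \in \Theta_n} \,\}$$

where $\{\Theta_1, \Theta_2, \ldots, \Theta_n\}$ is a collection of admissible parameter
spaces, we say that for fixed data $(\alpha_0, \beta_0, \gamma_0)$,

$$b_i \succeq b_j \iff F(f_{b_i}; \alpha_0, \beta_0, \gamma_0) \geq F(f_{b_j}; \alpha_0, \beta_0, \gamma_0). \qquad (15)$$

In this way we have established a partial order on the set $B$
of competing classifiers. Similarly, since there is a ROC
curve associated with each classifier family, we say that

$$f_{b_i} \succeq_{ROC} f_{b_j} \iff F(f_{b_i}; \alpha_0, \beta_0, \gamma_0) \geq F(f_{b_j}; \alpha_0, \beta_0, \gamma_0). \qquad (16)$$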
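Once a ROC functional value is available for each family, the relation in Equation 15 reduces to a comparison of real numbers. The sketch below takes the functional values as given; the family names and the values in `values` are hypothetical. It also makes visible that two families with equal functional values are each at least as good as the other, so such families are treated as equivalent under the ordering.

```python
def succeq(values, i, j):
    """Relation of Equation 15: family i is at least as good as family j."""
    return values[i] >= values[j]


# Hypothetical ROC functional values for four competing classifier families.
values = {"b1": 0.7764, "b2": 0.8882, "b3": 0.0, "b4": 0.7764}
names = list(values)

# The relation is reflexive and transitive on this collection.
reflexive = all(succeq(values, a, a) for a in names)
transitive = all(
    succeq(values, a, c)
    for a in names
    for b in names
    for c in names
    if succeq(values, a, b) and succeq(values, b, c)
)

# Families tied in functional value are related in both directions.
tied = succeq(values, "b1", "b4") and succeq(values, "b4", "b1")

# Ranking from best to worst; Python's sort is stable, so ties keep input order.
ranking = sorted(names, key=values.get, reverse=True)
```

The first entry of `ranking` is the family one would select under the fixed data.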

4  Fusors 

We are now in a position to define a system in which we
can compete fusion rules. Suppose we have a system such
as that in Figure 2. Each branch has a ROC curve that can
be associated with the classifier family, and we now have
a viable means of competing each branch. If we can only
choose among the two event-decision systems, take the one
whose associated ROC functional is greater. In the same way, we
can also compete these two event-decision systems with a
system that fuses the two data sets (or the feature sets for
that matter) by fixing a third classifier family and finding the
ROC functional of the event-decision system corresponding
to the fused data (features). If the fused system’s ROC
functional is greater than that of each of the original two, then the
fusion rule is in fact a fusor. Repeating this process on a
finite number of fusion rules, we discover a finite collection
of fusors with associated ROC functional values. The fusor
that is the best choice is then selected by finding the fusor
corresponding to the largest ROC functional value.

Do you want to change your a priori probabilities? Simply
adjust $\gamma$ in the ROC functional’s data and recalculate
the BOTs for each system. Then calculate the ROC functional
for each corresponding ROC and choose the largest
value. The corresponding fusor is then the best fusor to
select under your criteria. We have, for each set of ROC
functional data and each finite collection of fusion rules, a
partial ordering of fusors.

Definition 9 (Fusor) A fusor is a fusion rule of an event-decision
process which performs better, as measured by a functional
on its corresponding ROC curve, than any branch of
the graph of the original processes before applying a fusion
rule.


By way of example, suppose we start with a dual-sensor system
such as that in Figure 2, with classifier families $c_1$ and $c_2$,
and consider a functional $F$ on the ROC curves $f_{c_1}$ and
$f_{c_2}$ ($F$ being created under the assumptions and data of the
researcher’s choice). Then, given fusion rules $\mathfrak{R}$ and $\mathfrak{S}$
applied to the data sets $D_1$ and $D_2$,
let $f_{\mathfrak{R}}$ and $f_{\mathfrak{S}}$ refer to the ROC curves corresponding to
each of the fusion rule’s systems (for a possible example
of ROC curves of competing fusion rules, see Figure 7).
Then we have that if $F(f_{\mathfrak{R}}) > F(f_{c_i})$ for $i = 1, 2$ and if
$F(f_{\mathfrak{S}}) > F(f_{c_i})$ for $i = 1, 2$, then we say that $\mathfrak{R}$, $\mathfrak{S}$ are
fusors. Furthermore, suppose $F(f_{\mathfrak{R}}) \geq F(f_{\mathfrak{S}})$. Then we
have that $\mathfrak{R} \succeq_{ROC} \mathfrak{S}$. Thus, $\mathfrak{R}$ is the fusor a researcher
would select under the given assumptions and data.

Fig. 7: ROC curves of Competing Fusion Rules
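The selection procedure of this section can be sketched as a short routine operating on ROC functional values. The values below and the names `select_fusors`, `branch_values`, and `rule_values` are hypothetical; the routine assumes the functional has already been evaluated for each original branch and for each fused system under the same data.

```python
def select_fusors(branch_values, rule_values):
    """Return the fusion rules that beat every original branch, and the best one.

    branch_values: ROC functional value of each original branch.
    rule_values:   ROC functional value of each fused system.
    """
    best_branch = max(branch_values.values())
    fusors = {name: v for name, v in rule_values.items() if v > best_branch}
    best = max(fusors, key=fusors.get) if fusors else None
    return fusors, best


# Hypothetical functional values for two branches and three fusion rules.
branch_values = {"c1": 0.70, "c2": 0.74}
rule_values = {"R": 0.82, "S": 0.78, "U": 0.73}

fusors, best = select_fusors(branch_values, rule_values)
```

Here the rule `U` does not exceed the better branch and so is a fusion rule but not a fusor, while `R` is the fusor that would be selected.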


5  Conclusion 


A fusion researcher should have a viable method of comparing
fusion rules. Such a method is required to define fusion correctly,
and to demonstrate to the scientific community improvements
over existing methods. We have shown in this paper
that every fusion system can generate a corresponding
ROC curve, and under a mild assumption of smoothness of
the ROC curve, a Bayes Optimal Threshold (BOT) can be
found for each classifier family. Given additional assumptions
on the a priori probabilities of a target or non-target,
along with given thresholds for $P_{fp}$ and $P_{tp}$, a functional
can be generated which will yield a real value for each ROC
curve. This functional, called the ROC functional, will generate
a partial order of classifier families, fusion rules, and
ultimately fusors, which can then be used to select the best
fusor from among a finite collection.

Future research in this area will include looking for different
functionals which may be of interest to researchers,
considering fusion systems with greater than two-class label
sets as the end result, and robustness of classifiers and
fusors. Also, more research must be done on relaxing the
assumption of smoothness in the ROC curve, since many
ROC curves can only be approximated.